Add value_counts. #63

ueshin · 2019-04-08T07:49:32Z

No description provided.

ueshin · 2019-04-08T07:49:55Z

thunterdb

@ueshin Just a small question that can be resolved with a comment. Feel free to merge the pull request after adding it. Great work!

thunterdb · 2019-04-08T07:51:32Z

databricks/koala/structures.py

+            raise NotImplementedError("value_counts currently does not support bins")
+
+        if dropna:
+            df_dropna = self.to_dataframe().filter(self._spark_isNotNull())


Is this enough to filter out not only nulls but also NaNs?

Not for this PR, but I am also marking this as something to cover in the guide in #34.

@thunterdb Updated to check for NaNs as well.
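The distinction behind this question: in Spark, `isNotNull()` only excludes SQL nulls, so NaN values in a float or double column still pass that filter. Below is a minimal pure-Python sketch of `value_counts` semantics (not the koalas implementation), assuming `None` stands for a null and `float("nan")` for a NaN; `is_missing` and `value_counts` here are hypothetical helpers.

```python
import math
from collections import Counter
from typing import Any, Dict, Iterable, Optional


def is_missing(value: Any) -> bool:
    """Hypothetical helper: treat both None (null) and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def value_counts(values: Iterable[Any], dropna: bool = True) -> Dict[Optional[Any], int]:
    """Hypothetical sketch: count each distinct value, most frequent first.

    With dropna=True, nulls and NaNs are skipped. With dropna=False they are
    collapsed into a single None key, since NaN never compares equal to NaN.
    """
    counts: Counter = Counter()
    for value in values:
        if is_missing(value):
            if not dropna:
                counts[None] += 1
            continue
        counts[value] += 1
    # most_common() sorts by descending count; ties keep first-seen order.
    return dict(counts.most_common())


data = [1.0, float("nan"), 2.0, None, 1.0, float("nan")]

# A null-only filter, like isNotNull(), still lets NaN through.
null_filtered = [v for v in data if v is not None]
print(any(isinstance(v, float) and math.isnan(v) for v in null_filtered))  # True

print(value_counts(data))                # {1.0: 2, 2.0: 1}
print(value_counts(data, dropna=False))  # {None: 3, 1.0: 2, 2.0: 1}
```

Collapsing every missing value under one key is a choice made for this sketch; without it, each distinct NaN object could end up counted as its own group.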

thunterdb

@ueshin A small refactoring comment. This method is going to be very useful!

thunterdb · 2019-04-08T09:38:28Z

databricks/koala/structures.py

+
+        if dropna:
+            if isinstance(self.schema[self.name].dataType, (FloatType, DoubleType)):
+                pred = ~(self._spark_isNull() | F._spark_isnan(self))


You have just implemented Series.isnull(). That is fine; we can refactor this piece later. I opened an issue to track it.
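The type check in this diff exists because NaN can only occur in floating-point columns (`FloatType`/`DoubleType` in Spark). A minimal pure-Python sketch of that dispatch, assuming column types are given as plain strings; `FLOATING_TYPES` and `not_missing_predicate` are hypothetical names, not koalas or Spark API:

```python
import math
from typing import Any, Callable

# Hypothetical string stand-ins for Spark's FloatType and DoubleType.
FLOATING_TYPES = frozenset({"float", "double"})


def not_missing_predicate(dtype: str) -> Callable[[Any], bool]:
    """Return a per-value 'keep' predicate for a column of the given type."""
    if dtype in FLOATING_TYPES:
        # Floating-point columns: reject both nulls and NaNs,
        # mirroring ~(isNull | isnan) in the diff.
        return lambda v: not (v is None or math.isnan(v))
    # Every other type can only be missing via null.
    return lambda v: v is not None


keep_double = not_missing_predicate("double")
keep_string = not_missing_predicate("string")
print([v for v in [1.5, None, float("nan"), 0.0] if keep_double(v)])  # [1.5, 0.0]
print([v for v in ["a", None, "b"] if keep_string(v)])  # ['a', 'b']
```

Restricting the NaN check to the floating-point branch also avoids calling `math.isnan` on values such as strings, where it would raise `TypeError`.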

thunterdb · 2019-04-08T09:42:10Z

databricks/koala/structures.py

@@ -534,10 +560,14 @@ def dropna(self, axis=0, how='any', thresh=None, subset=None, inplace=False):
            else:
                columns = list(self.columns)

+            def pred(c):


Since you are already repeating this logic twice, would you mind going all the way and defining PandasLikeSeries.isnull? We can add the tests later when fully implementing #64.
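The suggestion is to define the missing-value check once and reuse it in both `value_counts` and `DataFrame.dropna`. A minimal pure-Python sketch of a row-wise `dropna` built on a single shared `notna` helper, assuming rows are plain dicts; `notna` and `dropna_rows` are hypothetical names, and the `how`/`subset` semantics follow pandas' `DataFrame.dropna`:

```python
import math
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def notna(value: Any) -> bool:
    """Shared check: a value is present unless it is None or a float NaN."""
    return not (value is None or (isinstance(value, float) and math.isnan(value)))


def dropna_rows(rows: List[Row], how: str = "any",
                subset: Optional[Sequence[str]] = None) -> List[Row]:
    """Drop rows with missing values, reusing notna() for every column."""
    if how not in ("any", "all"):
        raise ValueError("invalid how option: %s" % how)

    def keep(row: Row) -> bool:
        columns = list(subset) if subset is not None else list(row)
        present = [notna(row[c]) for c in columns]
        # how="any": drop if any value is missing, so keep only fully present rows.
        # how="all": drop only if every value is missing.
        return all(present) if how == "any" else any(present)

    return [row for row in rows if keep(row)]


rows = [
    {"a": 1.0, "b": None},
    {"a": float("nan"), "b": None},
    {"a": 2.0, "b": "x"},
]
print(dropna_rows(rows))                     # [{'a': 2.0, 'b': 'x'}]
print(len(dropna_rows(rows, how="all")))     # 2
print(len(dropna_rows(rows, subset=["a"])))  # 2
```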

ueshin · 2019-04-09T02:51:51Z

databricks/koala/structures.py

+
+    isnull = isna
+
+    def notna(self):


I will add a doc here when working on #64.
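The aliasing pattern in this diff (`isnull = isna`, plus a `notna` that negates `isna`) can be sketched on a toy list-backed class; `MiniSeries` is hypothetical and only illustrates the pattern, not the koalas Series:

```python
import math
from typing import Any, Iterable, List


class MiniSeries:
    """Hypothetical list-backed series used only to illustrate the aliases."""

    def __init__(self, values: Iterable[Any]) -> None:
        self.values: List[Any] = list(values)

    def isna(self) -> "MiniSeries":
        # True where the value is None (null) or a float NaN.
        return MiniSeries(
            v is None or (isinstance(v, float) and math.isnan(v)) for v in self.values
        )

    # A class-level alias: both names refer to the same function object.
    isnull = isna

    def notna(self) -> "MiniSeries":
        # Defined in terms of isna() so the missing-value rule lives in one place.
        return MiniSeries(not missing for missing in self.isna().values)


s = MiniSeries([1.0, None, float("nan"), "a"])
print(s.isna().values)   # [False, True, True, False]
print(s.notna().values)  # [True, False, False, True]
```

Because `isnull` is bound when the class body runs, a subclass that overrides `isna` would not change `isnull` automatically; it would need to re-alias it.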

ueshin · 2019-04-09T03:02:58Z

databricks/koala/structures.py

@@ -260,6 +260,17 @@ def to_dataframe(self):
    def toPandas(self):
        return _col(self.to_dataframe().toPandas())

+    def isna(self):


thunterdb

@ueshin This is much cleaner now, thanks. Merging.

Add value_counts.

74c7733

thunterdb reviewed Apr 8, 2019

Fix.

bf0cc49

ueshin force-pushed the value_counts branch from 293175a to bf0cc49 April 8, 2019 09:29

thunterdb reviewed Apr 8, 2019

Use notna().

5cdd5f0

ueshin commented Apr 9, 2019

View reviewed changes

databricks/koala/structures.py

isnull = isna

def notna(self):

Copy link

Collaborator Author

ueshin Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a doc here when working on #64.

Fix.

5442b33

ueshin force-pushed the value_counts branch from 922c3f1 to 5442b33 April 9, 2019 03:02

ueshin commented Apr 9, 2019

thunterdb reviewed Apr 9, 2019

thunterdb approved these changes Apr 9, 2019

thunterdb merged commit a5a0627 into databricks:master Apr 9, 2019

ueshin deleted the value_counts branch April 9, 2019 05:40
